Counting gaps per sequence

There are several ways of counting sequence gaps and of visualising the results. By default, the count_gaps_per_seq() method returns a matrix of counts that cannot be plotted; the drawable argument, used in the plotting sections below, requests a result that can. Setting unique=True restricts the counts to gaps uniquely induced by each sequence, which can be a useful indicator of highly divergent sequences.

from cogent3 import load_aligned_seqs

# load the BRCA1 alignment as DNA
aln = load_aligned_seqs('../../data/brca1.fasta', moltype='dna')

# count only the gaps uniquely induced by each sequence
counts = aln.count_gaps_per_seq(unique=True)
counts
FlyingFox  DogFaced  FreeTaile  LittleBro  TombBat  RoundEare  FalseVamp
        0         0          0          0        0          0          0

FlyingFox  LeafNose  Horse  Rhino  Pangolin  Cat  Dog  Llama  Pig  Cow  Hippo
        0         0      0      0         0    0    0      0    0    3      0

FlyingFox  SpermWhale  HumpbackW  Mole  Hedgehog  TreeShrew  FlyingLem  Galago
        0           0          0     0         0          3          0       3

FlyingFox  HowlerMon  Rhesus  Orangutan  Gorilla  Human  Chimpanzee  Jackrabbit
        0         21       0          0        0      0           0           0

FlyingFox  FlyingSqu  OldWorld  Mouse  Rat  NineBande  HairyArma  Anteater
        0         57         0      0    0          0          0         0

FlyingFox  Sloth  Dugong  Manatee  AfricanEl  AsianElep  RockHyrax  TreeHyrax
        0      0       0        0          0          0          0          0

FlyingFox  Aardvark  GoldenMol  Madagascar  Tenrec  LesserEle  GiantElep
        0         0          0           0       6          0          6

FlyingFox  Caenolest  Phascogale  Wombat  Bandicoot
        0          0           0       0          0
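
To make "uniquely induced" concrete, here is a minimal pure-Python sketch of the idea. It assumes that a sequence uniquely induces a gap when it is the only sequence with a non-gap character in an alignment column, so every other sequence needs a gap there. This is an illustration only and not cogent3's implementation; the function name count_unique_gaps and the toy alignment are hypothetical.

```python
def count_unique_gaps(aligned, gap="-"):
    """Count, per sequence, the columns where it alone has a non-gap character.

    `aligned` maps sequence names to equal-length aligned strings.
    This is a hypothetical helper for illustration, not a cogent3 function.
    """
    names = list(aligned)
    length = len(next(iter(aligned.values())))
    counts = {name: 0 for name in names}
    for col in range(length):
        # names of the sequences that have a real character in this column
        non_gap = [name for name in names if aligned[name][col] != gap]
        # exactly one such sequence: it induced the gap in all the others
        if len(non_gap) == 1:
            counts[non_gap[0]] += 1
    return counts


# toy alignment (made-up data, not taken from brca1.fasta)
toy = {
    "s1": "ACG---TTA",
    "s2": "ACGGGCTTA",
    "s3": "ACG---T-A",
}
result = count_unique_gaps(toy)
print(result)
```

In the toy data only s2 has characters in the fourth to sixth columns, so it receives a count of 3. The column where s3 alone has a gap is not counted for any sequence, because two sequences have a character there.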


Plotting counts of unique gaps

Three plot types are supported: bar, violin and box. In each case, hovering the mouse pointer over a data point displays the sequence name.

Displaying unique gaps as a bar chart

counts = aln.count_gaps_per_seq(unique=True, drawable='bar')
counts.show(width=500)

Displaying unique gaps as a violin plot

counts = aln.count_gaps_per_seq(unique=True, drawable='violin')
counts.show(width=300, height=500)

Displaying unique gaps as a box plot

counts = aln.count_gaps_per_seq(unique=True, drawable='box')
counts.show(width=300, height=500)
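
The plots are interactive, so it can help to have a plain numeric summary alongside them. The sketch below is not part of the cogent3 API: it hard-codes the non-zero counts from the output shown earlier in an ordinary dict (the variable names are illustrative) and reports the two sequences with the most unique gaps together with the median count.

```python
from statistics import median

# Non-zero unique-gap counts copied from the output shown earlier;
# every other sequence in that output has a count of 0.
nonzero = {
    "Cow": 3,
    "TreeShrew": 3,
    "Galago": 3,
    "HowlerMon": 21,
    "FlyingSqu": 57,
    "Tenrec": 6,
    "GiantElep": 6,
}
num_seqs = 55  # number of distinct sequence names in that output

# rank sequences from most to fewest unique gaps
ranked = sorted(nonzero.items(), key=lambda item: item[1], reverse=True)
top_two = ranked[:2]

# full list of counts, padded with zeros for the remaining sequences
all_counts = list(nonzero.values()) + [0] * (num_seqs - len(nonzero))
middle = median(all_counts)

print(top_two)
print(middle)
```

Most sequences have no unique gaps, so the median is 0 and FlyingSqu and HowlerMon stand out. This is consistent with what one would expect the box and violin plots to show: most points at zero with a few outlying sequences.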
